1 - 20 of 338
1.
PLoS One ; 19(5): e0295971, 2024.
Article En | MEDLINE | ID: mdl-38709794

The human genome is pervasively transcribed and produces a wide variety of long non-coding RNAs (lncRNAs), constituting the majority of transcripts across human cell types. Some specific nuclear lncRNAs have been shown to be important regulatory components acting locally. As RNA-chromatin interaction and Hi-C chromatin conformation data showed that chromatin interactions of nuclear lncRNAs are determined by the local chromatin 3D conformation, we used Hi-C data to identify potential target genes of lncRNAs. RNA-protein interaction data suggested that nuclear lncRNAs act as scaffolds to recruit regulatory proteins to target promoters and enhancers. Nuclear lncRNAs may therefore play a role in directing regulatory factors to locations spatially close to the lncRNA gene. We provide the analysis results through an interactive visualization web portal at https://fantom.gsc.riken.jp/zenbu/reports/#F6_3D_lncRNA.


Chromatin , RNA, Long Noncoding , RNA, Long Noncoding/genetics , RNA, Long Noncoding/metabolism , Chromatin/metabolism , Chromatin/genetics , Humans , Molecular Sequence Annotation , Cell Nucleus/metabolism , Cell Nucleus/genetics , Genome, Human , Promoter Regions, Genetic
2.
Nat Commun ; 15(1): 1400, 2024 Feb 21.
Article En | MEDLINE | ID: mdl-38383605

RNA structure folding largely influences RNA regulation by providing flexibility and functional diversity. In silico and in vitro analyses are limited in their ability to capture the intricate relationships between dynamic RNA structure and RNA functional diversity present in the cell. Here, we investigate sequence, structure and functional features of mouse and human SINE-transcribed retrotransposons embedded in SINEUPs long non-coding RNAs, which positively regulate target gene expression post-transcriptionally. In-cell secondary structure probing reveals that functional SINEs-derived RNAs contain conserved short structure motifs essential for SINEUP-induced translation enhancement. We show that SINE RNA structure dynamically changes between the nucleus and cytoplasm and is associated with compartment-specific binding to RBP and related functions. Moreover, RNA-RNA interaction analysis shows that the SINE-derived RNAs interact directly with ribosomal RNAs, suggesting a mechanism of translation regulation. We further predict the architecture of 18 SINE RNAs in three dimensions guided by experimental secondary structure data. Overall, we demonstrate that the conservation of short key features involved in interactions with RBPs and ribosomal RNA drives the convergent function of evolutionarily distant SINE-transcribed RNAs.


RNA, Long Noncoding , Short Interspersed Nucleotide Elements , Humans , RNA, Messenger/metabolism , Short Interspersed Nucleotide Elements/genetics , Gene Expression Regulation , RNA, Untranslated/genetics , RNA, Long Noncoding/genetics , RNA, Long Noncoding/metabolism
3.
Nature ; 622(7981): 41-47, 2023 Oct.
Article En | MEDLINE | ID: mdl-37794265

Scientists have been trying to identify every gene in the human genome since the initial draft was published in 2001. In the years since, much progress has been made in identifying protein-coding genes, currently estimated to number fewer than 20,000, with an ever-expanding number of distinct protein-coding isoforms. Here we review the status of the human gene catalogue and the efforts to complete it in recent years. Beside the ongoing annotation of protein-coding genes, their isoforms and pseudogenes, the invention of high-throughput RNA sequencing and other technological breakthroughs have led to a rapid growth in the number of reported non-coding RNA genes. For most of these non-coding RNAs, the functional relevance is currently unclear; we look at recent advances that offer paths forward to identifying their functions and towards eventually completing the human gene catalogue. Finally, we examine the need for a universal annotation standard that includes all medically significant genes and maintains their relationships with different reference genomes for the use of the human gene catalogue in clinical settings.


Genes , Genome, Human , Molecular Sequence Annotation , Protein Isoforms , Humans , Genome, Human/genetics , Molecular Sequence Annotation/standards , Molecular Sequence Annotation/trends , Protein Isoforms/genetics , Human Genome Project , Pseudogenes , RNA/genetics
4.
NAR Genom Bioinform ; 5(3): lqad075, 2023 Sep.
Article En | MEDLINE | ID: mdl-37608799

In the genomic era, data dissemination and visualization is an integral part of scientific publications and research projects involving international consortia producing massive genome-wide data sets, intra-organizational collaborations, or individual labs. However, creating custom supporting websites is oftentimes impractical due to the required programming effort, web server infrastructure, and data storage facilities, as well as the long-term maintenance burden. ZENBU-Reports (https://fantom.gsc.riken.jp/zenbu/reports) is a web application to create interactive scientific web portals by using graphical interfaces while providing storage and secured collaborative sharing for data uploaded by users. ZENBU-Reports provides the scientific visualization elements commonly used in supplementary websites, publications and presentations, presenting a complete solution for the interactive display and dissemination of data and analysis results during the full lifespan of a scientific project both during the active research phase and after publication of the results.

5.
Nature ; 621(7978): 389-395, 2023 Sep.
Article En | MEDLINE | ID: mdl-37648852

Insulin resistance is the primary pathophysiology underlying metabolic syndrome and type 2 diabetes [1,2]. Previous metagenomic studies have described the characteristics of gut microbiota and their roles in metabolizing major nutrients in insulin resistance [3-9]. In particular, carbohydrate metabolism of commensals has been proposed to contribute up to 10% of the host's overall energy extraction [10], thereby playing a role in the pathogenesis of obesity and prediabetes [3,4,6]. Nevertheless, the underlying mechanism remains unclear. Here we investigate this relationship using a comprehensive multi-omics strategy in humans. We combine unbiased faecal metabolomics with metagenomics, host metabolomics and transcriptomics data to profile the involvement of the microbiome in insulin resistance. These data reveal that faecal carbohydrates, particularly host-accessible monosaccharides, are increased in individuals with insulin resistance and are associated with microbial carbohydrate metabolisms and host inflammatory cytokines. We identify gut bacteria associated with insulin resistance and insulin sensitivity that show a distinct pattern of carbohydrate metabolism, and demonstrate that insulin-sensitivity-associated bacteria ameliorate host phenotypes of insulin resistance in a mouse model. Our study, which provides a comprehensive view of the host-microorganism relationships in insulin resistance, reveals the impact of carbohydrate metabolism by microbiota, suggesting a potential therapeutic target for ameliorating insulin resistance.


Carbohydrate Metabolism , Gastrointestinal Microbiome , Insulin Resistance , Animals , Humans , Mice , Diabetes Mellitus, Type 2/metabolism , Gastrointestinal Microbiome/physiology , Insulin Resistance/physiology , Monosaccharides/metabolism , Insulin/metabolism , Metabolic Syndrome/metabolism , Feces/chemistry , Feces/microbiology , Metabolomics
6.
bioRxiv ; 2023 Jun 18.
Article En | MEDLINE | ID: mdl-37398314

Long-read RNA sequencing is essential to produce accurate and exhaustive annotation of eukaryotic genomes. Despite advancements in throughput and accuracy, achieving reliable end-to-end identification of RNA transcripts remains a challenge for long-read sequencing methods. To address this limitation, we developed CapTrap-seq, a cDNA library preparation method, which combines the Cap-trapping strategy with oligo(dT) priming to detect 5'capped, full-length transcripts, together with the data processing pipeline LyRic. We benchmarked CapTrap-seq and other popular RNA-seq library preparation protocols in a number of human tissues using both ONT and PacBio sequencing. To assess the accuracy of the transcript models produced, we introduced a capping strategy for synthetic RNA spike-in sequences that mimics the natural 5'cap formation in RNA spike-in molecules. We found that the vast majority (up to 90%) of transcript models that LyRic derives from CapTrap-seq reads are full-length. This makes it possible to produce highly accurate annotations with minimal human intervention.

7.
Mol Ther Nucleic Acids ; 32: 402-414, 2023 Jun 13.
Article En | MEDLINE | ID: mdl-37187707

SINEUPs are natural and synthetic antisense long non-coding RNAs (lncRNAs) that selectively enhance translation of target mRNAs by increasing their association with polysomes. This activity requires two RNA domains: an embedded inverted SINEB2 element acting as effector domain, and an antisense region, the binding domain, conferring target selectivity. SINEUP technology presents several advantages for treating genetic (haploinsufficiencies) and complex diseases by restoring the physiological activity of diseased genes and of compensatory pathways. To streamline these applications to the clinic, a better understanding of the mechanism of action is needed. Here we show that natural mouse SINEUP AS Uchl1 and synthetic human miniSINEUP-DJ-1 are N6-methyladenosine (m6A) modified by the METTL3 enzyme. Then, we map m6A-modified sites along the SINEUP sequence with Nanopore direct RNA sequencing and a reverse transcription assay. We report that m6A removal from SINEUP RNA causes the depletion of endogenous target mRNA from actively translating polysomes, without altering SINEUP enrichment in ribosomal subunit-associated fractions. These results prove that SINEUP activity requires an m6A-dependent step to enhance translation of target mRNAs, providing a new mechanism for m6A translation regulation and strengthening our knowledge of the SINEUP-specific mode of action. Altogether these new findings pave the way to a more effective therapeutic application of this well-defined class of lncRNAs.

8.
ArXiv ; 2023 Mar 24.
Article En | MEDLINE | ID: mdl-36994150

Scientists have been trying to identify all of the genes in the human genome since the initial draft of the genome was published in 2001. Over the intervening years, much progress has been made in identifying protein-coding genes, and the estimated number has shrunk to fewer than 20,000, although the number of distinct protein-coding isoforms has expanded dramatically. The invention of high-throughput RNA sequencing and other technological breakthroughs have led to an explosion in the number of reported non-coding RNA genes, although most of them do not yet have any known function. A combination of recent advances offers a path forward to identifying these functions and towards eventually completing the human gene catalogue. However, much work remains to be done before we have a universal annotation standard that includes all medically significant genes, maintains their relationships with different reference genomes, and describes clinically relevant genetic variants.

10.
Nat Rev Mol Cell Biol ; 24(6): 430-447, 2023 06.
Article En | MEDLINE | ID: mdl-36596869

Genes specifying long non-coding RNAs (lncRNAs) occupy a large fraction of the genomes of complex organisms. The term 'lncRNAs' encompasses RNA polymerase I (Pol I), Pol II and Pol III transcribed RNAs, and RNAs from processed introns. The various functions of lncRNAs and their many isoforms and interleaved relationships with other genes make lncRNA classification and annotation difficult. Most lncRNAs evolve more rapidly than protein-coding sequences, are cell type specific and regulate many aspects of cell differentiation and development and other physiological processes. Many lncRNAs associate with chromatin-modifying complexes, are transcribed from enhancers and nucleate phase separation of nuclear condensates and domains, indicating an intimate link between lncRNA expression and the spatial control of gene expression during development. lncRNAs also have important roles in the cytoplasm and beyond, including in the regulation of translation, metabolism and signalling. lncRNAs often have a modular structure and are rich in repeats, which are increasingly being shown to be relevant to their function. In this Consensus Statement, we address the definition and nomenclature of lncRNAs and their conservation, expression, phenotypic visibility, structure and functions. We also discuss research challenges and provide recommendations to advance the understanding of the roles of lncRNAs in development, cell biology and disease.


RNA, Long Noncoding , RNA, Long Noncoding/genetics , Cell Nucleus/genetics , Chromatin/genetics , Regulatory Sequences, Nucleic Acid , RNA Polymerase II/genetics
11.
Nat Biomed Eng ; 7(6): 830-844, 2023 06.
Article En | MEDLINE | ID: mdl-36411359

Gene transcription is regulated through complex mechanisms involving non-coding RNAs (ncRNAs). As the transcription of ncRNAs, especially of enhancer RNAs, is often low and cell type specific, how the levels of RNA transcription depend on genotype remains largely unexplored. Here we report the development and utility of a machine-learning model (MENTR) that reliably links genome sequence and ncRNA expression at the cell type level. Effects on ncRNA transcription predicted by the model were concordant with estimates from published studies in a cell-type-dependent manner, regardless of allele frequency and genetic linkage. Among 41,223 variants from genome-wide association studies, the model identified 7,775 enhancer RNAs and 3,548 long ncRNAs causally associated with complex traits across 348 major human primary cells and tissues, such as rare variants plausibly altering the transcription of enhancer RNAs to influence the risks of Crohn's disease and asthma. The model may aid the discovery of causal variants and the generation of testable hypotheses for biological mechanisms driving complex traits.


Genome-Wide Association Study , RNA, Untranslated , Humans , RNA, Untranslated/genetics , Transcription, Genetic/genetics , Genome
12.
EMBO Rep ; 24(2): e53801, 2023 02 06.
Article En | MEDLINE | ID: mdl-36472244

Adult neural progenitor cells (aNPCs) ensure lifelong neurogenesis in the mammalian hippocampus. Proper regulation of aNPC fate thus has important implications for brain plasticity and healthy aging. Piwi proteins and the small noncoding RNAs interacting with them (piRNAs) have been proposed to control memory and anxiety, but the mechanism remains elusive. Here, we show that Piwil2 (Mili) is essential for proper neurogenesis in the postnatal mouse hippocampus. RNA sequencing of aNPCs and their differentiated progeny reveals that Mili and piRNAs are dynamically expressed in neurogenesis. Depletion of Mili and piRNAs in the adult hippocampus impairs aNPC differentiation toward a neural fate, induces senescence, and generates reactive glia. Transcripts modulated upon Mili depletion bear sequences complementary or homologous to piRNAs and include repetitive elements and mRNAs encoding essential proteins for proper neurogenesis. Our results provide evidence of a critical role for Mili in maintaining fitness and proper fate of aNPCs, underpinning a possible involvement of the piRNA pathway in brain plasticity and successful aging.


Argonaute Proteins , Hippocampus , Neurogenesis , Animals , Mice , Argonaute Proteins/genetics , Argonaute Proteins/metabolism , Cellular Senescence/genetics , Hippocampus/metabolism , Mammals/genetics , Mammals/metabolism , Neurogenesis/genetics , RNA, Small Interfering/genetics , RNA, Small Interfering/metabolism
13.
Cell Rep ; 41(13): 111893, 2022 12 27.
Article En | MEDLINE | ID: mdl-36577377

Within the scope of the FANTOM6 consortium, we perform a large-scale knockdown of 200 long non-coding RNAs (lncRNAs) in human induced pluripotent stem cells (iPSCs) and systematically characterize their roles in self-renewal and pluripotency. We find 36 lncRNAs (18%) exhibiting cell growth inhibition. From the knockdown of 123 lncRNAs with transcriptome profiling, 36 lncRNAs (29.3%) show molecular phenotypes. Integrating the molecular phenotypes with chromatin-interaction assays further reveals cis- and trans-interacting partners as potential primary targets. Additionally, cell-type enrichment analysis identifies lncRNAs associated with pluripotency, while the knockdown of LINC02595, CATG00000090305.1, and RP11-148B6.2 modulates colony formation of iPSCs. We compare our results with previously published fibroblasts phenotyping data and find that 2.9% of the lncRNAs exhibit a consistent cell growth phenotype, whereas we observe 58.3% agreement in molecular phenotypes. This highlights that molecular phenotyping is more comprehensive in revealing affected pathways.


Induced Pluripotent Stem Cells , RNA, Long Noncoding , Humans , RNA, Long Noncoding/genetics , RNA, Long Noncoding/metabolism , Induced Pluripotent Stem Cells/metabolism , Oligonucleotides, Antisense , Gene Expression Profiling/methods , Embryonic Stem Cells/metabolism
14.
Nat Genet ; 54(11): 1675-1689, 2022 11.
Article En | MEDLINE | ID: mdl-36333502

The value of genome-wide over targeted driver analyses for predicting clinical outcomes of cancer patients is debated. Here, we report the whole-genome sequencing of 485 chronic lymphocytic leukemia patients enrolled in clinical trials as part of the United Kingdom's 100,000 Genomes Project. We identify an extended catalog of recurrent coding and noncoding genetic mutations that represents a source for future studies and provide the most complete high-resolution map of structural variants, copy number changes and global genome features including telomere length, mutational signatures and genomic complexity. We demonstrate the relationship of these features with clinical outcome and show that integration of 186 distinct recurrent genomic alterations defines five genomic subgroups that associate with response to therapy, refining conventional outcome prediction. While requiring independent validation, our findings highlight the potential of whole-genome sequencing to inform future risk stratification in chronic lymphocytic leukemia.


Leukemia, Lymphocytic, Chronic, B-Cell , Humans , Leukemia, Lymphocytic, Chronic, B-Cell/genetics , Whole Genome Sequencing , Mutation , Genomics , Prognosis
15.
Sci Adv ; 8(36): eabo3192, 2022 09 09.
Article En | MEDLINE | ID: mdl-36070371

Mechanistic insights into the molecular events by which exercise enhances the skeletal muscle phenotype are lacking, particularly in the context of type 2 diabetes. Here, we unravel a fundamental role for exercise-responsive cytokines (exerkines) on skeletal muscle development and growth in individuals with normal glucose tolerance or type 2 diabetes. Acute exercise triggered an inflammatory response in skeletal muscle, concomitant with an infiltration of immune cells. These exercise effects were potentiated in type 2 diabetes. In response to contraction or hypoxia, cytokines were mainly produced by endothelial cells and macrophages. The chemokine CXCL12 was induced by hypoxia in endothelial cells, as well as by conditioned medium from contracted myotubes in macrophages. We found that CXCL12 was associated with skeletal muscle remodeling after exercise and differentiation of cultured muscle. Collectively, acute aerobic exercise mounts a noncanonical inflammatory response, with an atypical production of exerkines, which is potentiated in type 2 diabetes.


Diabetes Mellitus, Type 2 , Exercise , Inflammation , Chemokine CXCL12 , Cytokines , Endothelial Cells , Humans , Hypoxia , Muscle, Skeletal/physiology
16.
Bioinformatics ; 38(22): 5126-5128, 2022 11 15.
Article En | MEDLINE | ID: mdl-36173306

MOTIVATION: Cell type-specific activities of cis-regulatory elements (CRE) are central to understanding gene regulation and disease predisposition. Single-cell RNA 5'end sequencing (sc-end5-seq) captures the transcription start sites (TSS) which can be used as a proxy to measure the activity of transcribed CREs (tCREs). However, a substantial fraction of TSS identified from sc-end5-seq data may not be genuine due to various artifacts, hindering the use of sc-end5-seq for de novo discovery of tCREs. RESULTS: We developed SCAFE (Single-Cell Analysis of Five-prime Ends), a software suite that processes sc-end5-seq data to de novo identify TSS clusters based on multiple logistic regression. It annotates tCREs based on the identified TSS clusters and generates a tCRE-by-cell count matrix for downstream analyses. The software suite consists of a set of flexible tools that could either be run independently or as pre-configured workflows. AVAILABILITY AND IMPLEMENTATION: SCAFE is implemented in Perl and R. The source code and documentation are freely available for download under the MIT License from https://github.com/chung-lab/SCAFE. Docker images are available from https://hub.docker.com/r/cchon/scafe. The submitted software version and test data are archived at https://doi.org/10.5281/zenodo.7023163 and https://doi.org/10.5281/zenodo.7024060, respectively. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Regulatory Sequences, Nucleic Acid , Software , Workflow , Transcription Initiation Site
17.
Genome Res ; 2022 Aug 12.
Article En | MEDLINE | ID: mdl-35961773

In eukaryotes, capped RNAs include long transcripts such as messenger RNAs and long noncoding RNAs, as well as shorter transcripts such as spliceosomal RNAs, small nucleolar RNAs, and enhancer RNAs. Long capped transcripts can be profiled using cap analysis gene expression (CAGE) sequencing and other methods. Here, we describe a sequencing library preparation protocol for short capped RNAs, apply it to a differentiation time course of the human cell line THP-1, and systematically compare the landscape of short capped RNAs to that of long capped RNAs. Transcription initiation peaks associated with genes in the sense direction have a strong preference to produce either long or short capped RNAs, with one out of six peaks detected in the short capped RNA libraries only. Gene-associated short capped RNAs have highly specific 3' ends, typically overlapping splice sites. Enhancers also preferentially generate either short or long capped RNAs, with 10% of enhancers observed in the short capped RNA libraries only. Enhancers producing either short or long capped RNAs show enrichment for GWAS-associated disease SNPs. We conclude that deep sequencing of short capped RNAs reveals new families of noncoding RNAs and elucidates the diversity of transcripts generated at known and novel promoters and enhancers.

18.
Cell ; 185(16): 3025-3040.e6, 2022 08 04.
Article En | MEDLINE | ID: mdl-35882231

Non-allelic recombination between homologous repetitive elements contributes to evolution and human genetic disorders. Here, we combine short- and long-DNA read sequencing of repeat elements with a new bioinformatics pipeline to show that somatic recombination of Alu and L1 elements is widespread in the human genome. Our analysis uncovers tissue-specific non-allelic homologous recombination hallmarks; moreover, we find that centromeres and cancer-associated genes are enriched for retroelements that may act as recombination hotspots. We compare recombination profiles in human-induced pluripotent stem cells and differentiated neurons and find that the neuron-specific recombination of repeat elements accompanies chromatin changes during cell-fate determination. Finally, we report that somatic recombination profiles are altered in Parkinson's and Alzheimer's disease, suggesting a link between retroelement recombination and genomic instability in neurodegeneration. This work highlights a significant contribution of the somatic recombination of repeat elements to genomic diversity in health and disease.


Genome, Human , Retroelements , Alu Elements/genetics , Homologous Recombination , Humans , Long Interspersed Nucleotide Elements , Repetitive Sequences, Nucleic Acid
19.
Nat Genet ; 54(7): 1037-1050, 2022 07.
Article En | MEDLINE | ID: mdl-35789323

Zebrafish, a popular organism for studying embryonic development and for modeling human diseases, has so far lacked a systematic functional annotation program akin to those in other animal models. To address this, we formed the international DANIO-CODE consortium and created a central repository to store and process zebrafish developmental functional genomic data. Our data coordination center ( https://danio-code.zfin.org ) combines a total of 1,802 sets of unpublished and re-analyzed published genomic data, which we used to improve existing annotations and show its utility in experimental design. We identified over 140,000 cis-regulatory elements throughout development, including classes with distinct features dependent on their activity in time and space. We delineated the distinct distance topology and chromatin features between regulatory elements active during zygotic genome activation and those active during organogenesis. Finally, we matched regulatory elements and epigenomic landscapes between zebrafish and mouse and predicted functional relationships between them beyond sequence similarity, thus extending the utility of zebrafish developmental genomics to mammals.


Databases, Genetic , Gene Expression Regulation, Developmental , Genome , Genomics , Regulatory Sequences, Nucleic Acid , Zebrafish Proteins , Zebrafish , Animals , Chromatin/genetics , Genome/genetics , Humans , Mice , Molecular Sequence Annotation , Organogenesis/genetics , Regulatory Sequences, Nucleic Acid/genetics , Zebrafish/embryology , Zebrafish/genetics , Zebrafish Proteins/genetics
...